Abstract In this work, we select high signal-to-noise ratio stellar spectra from the
LAMOST data and map their MK classes to spectral features. The equivalent widths of
the prominent spectral lines, playing a role similar to multi-color photometry, form
a clean stellar locus well ordered by MK class. The advantage of the stellar locus in line
indices is that it gives a natural and continuous classification of stars consistent with both
the broadly used MK classes and the stellar astrophysical parameters. We also employ an
SVM-based classification algorithm to assign MK classes to the LAMOST stellar spectra.
We find that the completeness of the classification is up to 90% for A and G type stars,
while it is down to about 50% for OB and K type stars. About 40% of the OB and K
type stars are mis-classified as A and G type stars, respectively. This is likely because
the differences in spectral features between late B and early A type stars, or
between late G and early K type stars, are very weak. The relatively poor performance
of the automatic MK classification with SVM suggests that directly using the line
indices to classify stars is likely the preferable choice.
1 INTRODUCTION

The classification of normal stars plays an important role not only in understanding stellar
physics, but also in studying the overall structure and evolution of the Milky Way. The MK classification
(Morgan & Keenan 1973) is one of the most broadly used systems, based on the spectral features of
a small number of standard stars. Compared with mapping stars directly by effective
temperature, surface gravity, and chemical abundances, the MK classification is simple and effective. A
usual procedure for processing a spectrum in a spectroscopic survey is to first assign an MK class to
the spectrum and then estimate the stellar astrophysical parameters using the MK class as the starting
point (e.g. Luo et al. 2015). Spectral classifications are also very helpful in targeting for follow-up
studies. For instance, to select blue horizontal branch stars from a whole dataset, one
might first select all A type stars to reduce the size of the sample; to study the circumstellar
environment of young massive stars, one needs to first select the OB type stars from the full sample;
and to search for AGB stars, one has to first select the M giant stars from the database.

Alternatively, stars can be classified based on color indices. Nowadays, billions of stars
have accurate multi-band photometry covering the UV to infrared bands, e.g. GALEX (Bianchi et
al. 2011), SDSS (Ahn et al.), PanSTARRS (Tonry et al. 2012), 2MASS (Skrutskie et al. 2006),
WISE (Wright et al. 2010), etc., which provide abundant information on the stellar astrophysical
parameters. For instance, Covey et al. (2007) mapped stars of different types in the SDSS+2MASS
multi-color space and showed a clear continuous stellar locus, on which any reasonable stellar classification
can be set up, including the well-known MK class system. The biggest advantage of the continuous
stellar locus in color space is that it naturally reflects how the spectral energy distribution varies with the
stellar astrophysical parameters, such as the effective temperature, the surface gravity, and the metallicity.
It is therefore very important for research on both stellar evolution and the overall features
of the Milky Way. In fact, in the context of surveys with millions to billions of stars, the color-based
stellar locus may be more effective and straightforward for stellar classification (see the applications
in Yanny et al. 2000; Majewski et al. 2003; Yanny et al. 2009, etc.). However, when one goes into the
deep sky, especially along the Galactic mid-plane, most of the photometric color indices of stars are
reddened by the absorption and scattering of the interstellar medium. Although some remarkable work
has been done (Schlegel et al. 1998; Schlafly et al. 2013; Chen et al. 2014), knowledge of the
3-dimensional reddening distribution of the Milky Way is still very limited, leading to systematics
that vary with line of sight and distance in multi-color index-based stellar classifications. Moreover,
in most cases a color index is an integration of the spectrum over a wide range of wavelengths, so the details
shown in the spectral lines are smoothed out. Therefore, in general, color index-based classifications
of stars cannot completely take the place of spectra-based classifications.

Most of the known MK types of stars were classified by manually comparing the spectra with a
small set of standard stars (e.g. the samples in Corbally, Gray & Garrison 1994), which is inefficient
when the sample is huge and not always reliable. Although efforts have been made to automate the MK
classification by developing automatic software (e.g. Gray & Corbally 2014), it is still a non-trivial
task, since real stellar spectra are sensitive not only to the effective temperature and luminosity, but
also to the elemental abundances. Moreover, in a large spectroscopic survey the spectra may
span a wide range of signal-to-noise ratios, making the spectral features not always as clear as in the
small set of well-observed, high-quality standard spectra. For such a large spectroscopic survey, the
mis-classification in template-matching techniques based on standard stars may be significant for
low signal-to-noise ratio data, and it may subsequently affect the search for peculiar
and rare objects.

Other efforts at star classification based on automatic algorithms have been made over the past
twenty years. These algorithms include the metric-distance technique (e.g. LaSala 1994),
artificial neural networks (e.g. Bailer-Jones 2007), fuzzy logic methods (e.g. Carricajo et al. 2004), etc.
Bailer-Jones et al. (2008) and Saglia et al. (2012) reported applications of the support
vector machine (SVM) to star-galaxy-QSO classification. In general, this technique can also
be used for the classification of stars.

Recently, the LAMOST survey (Cui et al. 2012; Zhao et al. 2012; Deng et al. 2012) has
collected more than 4 million stellar spectra in its 2nd internal data release (DR2). Unlike SDSS, the
LAMOST survey does not have a photometric survey combined with its spectroscopic one; its
targets are selected from several external photometric catalogs (Carlin et al. 2012; Yuan et al. 2014).
This makes it difficult to establish a star classification based on photometric color indices, since
the multiple input catalogs are not well calibrated against each other. With only the stellar spectra, it is
not trivial to automatically classify the stars into different MK types. The LAMOST pipeline (Luo et al. 2012;
Luo et al. 2015) runs a cross-correlation based algorithm (correlation function initial; CFI) to assign an MK
type to each stellar spectrum. However, due to some technical issues (e.g., the noise in the spectra, the
distortion of the continua by interstellar extinction, and the limits of the synthetic library used in CFI),
this classification, which has already appeared in the LAMOST catalog, is not very reliable, especially
for O, B, A, and M type stars. Therefore, a robust and reliable automatic classification method suitable
for all spectral classes is urgently required for the LAMOST spectra.

In this work, we map the MK classes to the space of the indices of the prominent spectral lines.
The line indices naturally form a stellar locus from the hottest to the coolest stars because of the
smooth transition of these spectral lines with the effective temperature and surface gravity of the stars.
In principle, unlike the broadly used discrete MK classes, the line indices automatically provide a
continuous classification, although differences in elemental abundance may broaden the locus. Meanwhile,
the MK classes or other class systems can easily be mapped to the line index space to find their
counterparts. We also employ the SVM to assign MK classes to the stars and compare the result with the
line indices-based classification. We suggest that the line indices-based classification is one of the
most robust ways to classify stars in the era of large data.

The paper is organized as follows. In section 2, we give a brief introduction to the LAMOST survey
and the data selection for the classification, followed by the detailed definition of the indices of more
than 20 spectral lines. In section 3, we show the features of the stellar locus in
the space of the line indices and how the locus is associated with the MK classes. In section 4, we employ
the support vector machine to classify the stars into MK types and compare the stellar locus-based
and SVM-based MK classifications. We raise discussions in section 5 and draw brief conclusions
in section 6.


2 DATA 

2.1 LAMOST and SIMBAD data 

The LAMOST telescope, also called the Guo Shou Jing telescope, is a 4-m reflecting Schmidt telescope with
4000 fibers configured on a 5-degree field of view (Cui et al. 2012; Zhao et al. 2012). The LAMOST
Milky Way survey will finally obtain more than 5 million stellar spectra with a resolution of 1800 in
its 5-year observations (Deng et al. 2012; Liu, X.-W. et al. 2014). It appears likely to obtain more spectra
than scheduled, given that the DR2 catalog released by the LAMOST team already contained about 4 million
stellar spectra by the end of 2014.

We select about 1.52 million stellar spectra with signal-to-noise ratio larger than 20 (meaning that
the averaged signal-to-noise ratios in both the g and i bands are larger than 20) to investigate how the spectral
features vary with the star classes. To identify their MK classes, we cross-identify them with the
SIMBAD catalog (Wenger et al. 2000) and obtain 3,134 spectra of normal stars with MK classification
flags in SIMBAD. Table 1 shows the distribution of the MK classes for the sample. The sample is
distributed very unevenly over the MK classes: stars between late B and early A and the G and K type
stars dominate the sample. Main-sequence stars are much more numerous than giant stars, while
supergiant stars are very rare.

2.2 Line indices 

To associate the star classes with the spectral features, we measure the line indices of spectral
lines instead of using the full spectra. In general, line indices do not require flux calibration, which is
very hard to achieve in the LAMOST pipeline due to the complicated instrument, e.g., 4000 fibers with
different lengths and 16 spectrographs with slightly different performance. They are also very robust
against random noise. Although the sky background is very difficult to subtract cleanly from
the spectra, the blue part of the spectra is less affected. Fortunately, most of the well-known line
indices, e.g. the Lick indices (Worthey et al. 1994; Worthey & Ottaviani 1997), are in the blue.

The selection of the spectral lines follows two principles. First, the lines should be strong
enough to be effectively detected in low-resolution spectra. Second, the lines should be
sensitive to the effective temperature, surface gravity, and metallicity so that they can play a role in the
classification. Table 2 lists all 27 spectral lines used in this work. Most of them are adopted from the Lick
indices (Worthey et al. 1994; Worthey & Ottaviani 1997; Cohen, Blakeslee & Ryzhov 1998). To better
separate OB type stars, we add three helium lines. And since the Ca II K line is also often used for
classification, it is also considered, based on the definition by Beers et al. (1999).

We define the line index in terms of equivalent width (EW) with the following equation (Worthey
et al. 1994):

$$\mathrm{EW} = \int \left(1 - \frac{f_{\rm line}(\lambda)}{f_{\rm cont}(\lambda)}\right) d\lambda, \qquad (1)$$

where $f_{\rm cont}(\lambda)$ and $f_{\rm line}(\lambda)$ are the fluxes of the continuum and the spectral line, respectively, both of
which are functions of the wavelength $\lambda$. The continuum $f_{\rm cont}$ is estimated via linear interpolation of
the fluxes located in the pseudo-continuum regions on either side of each index bandpass (see Table 2).
The line index under this definition is in Å. Note that the equivalent widths are measured on
the rest-frame spectra, in which the radial velocities have been corrected. The radial
velocities are adopted from the LAMOST catalog. For the spectra with signal-to-noise ratio
larger than 20, the median uncertainty of the equivalent widths is smaller than 0.1 Å.
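
As an illustration of Eq. (1), the following minimal Python sketch measures an equivalent width from a rest-frame spectrum. It is not the measurement code used for the LAMOST spectra: the function names, the use of the mean flux in each pseudo-continuum window as the anchor of the linear continuum, and the trapezoidal integration over the pixels inside the bandpass are all assumptions made for this example.

```python
def mean_point(wave, flux, lo, hi):
    """Mean wavelength and mean flux of the pixels with lo <= wavelength <= hi."""
    pts = [(w, f) for w, f in zip(wave, flux) if lo <= w <= hi]
    if not pts:
        raise ValueError("no pixels inside the window")
    n = len(pts)
    return sum(w for w, _ in pts) / n, sum(f for _, f in pts) / n


def equivalent_width(wave, flux, band, blue, red):
    """Equivalent width (same unit as `wave`) of the feature inside `band`.

    `band`, `blue` and `red` are (start, end) wavelength pairs: the index
    bandpass and the two pseudo-continuum windows on either side of it.
    """
    # Anchor points of the linear pseudo-continuum (assumed: window means).
    wb, fb = mean_point(wave, flux, *blue)
    wr, fr = mean_point(wave, flux, *red)
    slope = (fr - fb) / (wr - wb)

    def depth(i):
        cont = fb + slope * (wave[i] - wb)  # interpolated continuum flux
        return 1.0 - flux[i] / cont         # integrand of Eq. (1)

    # Trapezoidal integration over the pixels inside the index bandpass.
    idx = [i for i, w in enumerate(wave) if band[0] <= w <= band[1]]
    ew = 0.0
    for i, j in zip(idx[:-1], idx[1:]):
        ew += 0.5 * (depth(i) + depth(j)) * (wave[j] - wave[i])
    return ew
```

With wavelengths in Å, a call for the bandpass 4847.875-4876.625 of Table 2 would read `equivalent_width(wave, flux, (4847.875, 4876.625), (4827.875, 4847.875), (4876.625, 4891.625))`. Absorption features give positive values and emission features negative ones.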

Fig. 1 shows the median equivalent width of the different spectral lines for each class of stars. It is
evident that the spectral lines are not equally sensitive to the stellar classes. All Balmer lines, i.e., Hα, Hβ,
Hγ, and Hδ, separate the classes well. The magnesium lines are also sensitive to the classes, particularly
for late type stars. Although the iron lines do not change as significantly as the Mg lines, they also show a clear
trend across the classes. Finally, the TiO lines are very sensitive to the M type stars. Many
of the spectral lines appear to be correlated, so we do not need to use all of them for the classification. We
select as the representative Balmer line the one with the largest amplitude of variation among the
Balmer lines. Then we average Mg1, Mg2, and Mgb as the composite Mg line index. We also
average all 9 iron lines as the composite Fe line index. Finally, we select the G band (CH) and TiO2
to represent the molecular bands. In total, we obtain 5 (composite) line indices for all selected stars.
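
The reduction from the 27 measured indices to the 5 composite ones can be sketched as follows. The dictionary keys follow the line names of Table 2, while the function name, the output keys, and the `balmer` argument (the key of whichever Balmer index is adopted as the representative one) are hypothetical choices for this example.

```python
from statistics import mean

# Lines entering the composite Mg and Fe indices (names as in Table 2).
MG_LINES = ("Mg1", "Mg2", "Mgb")
FE_LINES = ("Fe4383", "Fe4531", "Fe4668", "Fe5015", "Fe5270",
            "Fe5335", "Fe5406", "Fe5709", "Fe5782")


def composite_indices(ew, balmer):
    """Reduce a dict of equivalent widths (in Angstrom) to 5 indices.

    `ew` maps line names to equivalent widths; `balmer` is the key of the
    Balmer index chosen as representative (an argument of this sketch).
    """
    return {
        "Balmer": ew[balmer],
        "Mg": mean(ew[name] for name in MG_LINES),  # average of 3 Mg indices
        "Fe": mean(ew[name] for name in FE_LINES),  # average of 9 Fe indices
        "G4300": ew["G4300"],
        "TiO2": ew["TiO2"],
    }
```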

Although the Ca II K line is frequently used in classification and parameterization, we decide not to
use it because it does not provide extra information about the spectral types and is located near the blue
end of the spectra, where the wavelength calibration and the efficiency of the instrument are not as good
as for the other lines, making the Ca II K line index not very stable.

3 LINE INDICES-BASED CLASSIFICATION 

Fig. 2 shows the stellar loci in the space of the 5 line indices, i.e. the Balmer index, Mg, Fe, G band,
and TiO2, for all 1.5 million selected stars (only their distributions are shown, as blue contours). The unit
of the x- and y-axes is Å. The hollow circles with the neighboring dark gray labels mark the median
positions of the main-sequence MK classes from the SIMBAD catalog. For instance, the hollow circle
labeled “G2V” is the median value of the stars of types G0V, G1V, G2V, and G3V, and the symbol labeled
“G5V” is the median of all G3V, G4V, G5V, and G6V stars. Neighboring circles overlap with each other by
one decimal subtype in order to make the stellar locus smoother. Similarly, the red asterisks with the
neighboring labels indicate the locus of the giant star MK classes. The detailed positions of these circles
(asterisks) for main-sequence (giant) stars are listed in Table 3 (Table 4). Fig. 3 zooms in to smaller
regions for better illustration of the early type stars.
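
The overlapping-bin medians described above can be sketched in a few lines of Python. The bin edges reproduce the grouping quoted in the text (subtypes 0-3, 3-6, and 6-9, so that neighboring bins share one subtype); the layout of the `stars` input and the function name are assumptions of this example.

```python
from statistics import median

# Subtype bins overlapping by one decimal subtype, e.g. G0-3, G3-6, G6-9.
BINS = ((0, 3), (3, 6), (6, 9))


def median_locus(stars):
    """Median line indices per overlapping subtype bin.

    `stars` is an iterable of (letter, subtype, indices), where `letter` is
    the spectral class ("G"), `subtype` an integer 0..9, and `indices` a
    tuple of line indices for one star.
    """
    groups = {}
    for letter, subtype, indices in stars:
        for lo, hi in BINS:
            if lo <= subtype <= hi:  # a boundary subtype enters two bins
                groups.setdefault(f"{letter}{lo}-{hi}", []).append(indices)
    # Column-wise median of the stars collected in each bin.
    return {label: tuple(median(col) for col in zip(*rows))
            for label, rows in groups.items()}
```

A star of subtype 3 or 6 contributes to two neighboring bins, which is what smooths the locus.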

First, the stars from O to M type can be well separated and ordered in the Balmer index vs. G4300 plane,
shown in the top-right panel of Fig. 2. In the Balmer index vs. Fe plane (top-left panel), the stars from O to
G type are well separated, while the M and K type stars overlap at the top of the stellar loci and are hard to
disentangle. A similar trend is seen in the Balmer index vs. Mg plane in the middle-left panel of Fig. 2.
However, the late type main-sequence stars are well disentangled in the Mg vs. Fe, Fe vs. G4300, and Fe vs.
TiO2 planes shown in the middle-right, bottom-left, and bottom-right panels, respectively, while the early
type stars in these planes are clumped and hard to separate from each other. Combining the 5 line indices,
we are able to separate all types of main-sequence stars from O to M type.

Second, the separation of the luminosity types works well for K and M giant stars. Especially in the Mg
vs. Fe (middle-right panel of Fig. 2) and Fe vs. TiO2 (bottom-right panel) planes, the cool ends of
the stellar loci of main-sequence and giant stars go in different directions. In the Mg vs. Fe plane, the locus
of the main-sequence stars goes down toward smaller Fe and larger Mg indices at the coolest end, while
the locus of the giant stars goes up toward larger Fe but smaller Mg indices. A similar trend can also be
seen in the Fe vs. TiO2 plane. However, it is very hard to disentangle the early type giant stars, e.g. B, A,
and F type giants. These giants are located at almost exactly the same positions as the
main-sequence stars of the same type. According to Gray & Corbally (2009), some weaker lines, such as O II
(at 4070, 4076, 4348, and 4416 Å) and Si IV at 4116 Å, may be helpful in discriminating the luminosity
types of B type stars. However, they are very weak in the low-resolution LAMOST spectra and may be
significantly affected by noise.

It is worth pointing out that the variation of the Fe index for late type stars is mostly not directly
related to the Fe lines, but is significantly affected by prominent molecular bands, e.g. TiO, that happen
to overlap at the same wavelengths. Likewise, the response of the Mg index to cool stars is actually
dominated by the MgH band.

Note also that the dispersions shown in Fig. 2 are not contributed only by the uncertainties
of the line indices, which are only about 0.1 Å. The dispersions may be intrinsic and related to the
broad diversity of metallicity. In this paper, we mainly focus on the effective temperature, which
corresponds to the spectral types, and the surface gravity, which is related to the luminosity. The
effect of metallicity on spectral classification may be more complicated, because it also reflects the
evolution of different stellar populations. We leave this topic to future work.

The classification of stars based on the stellar loci can be done by looking up the line indices
in Tables 3 and 4. For any statistical study of the Milky Way, one can conveniently select stars located
in a segment of the stellar loci in Figs. 2 and 3 according to the marked MK classes. Compared with
the classical MK classes, the stellar loci in line indices, acting just like the color indices in a
multi-band photometric system, provide natural and continuous sequences of stars, which are easier to use in
quantitative statistics. More discussion can be found in section 5.
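
One simple way to turn the tabulated locus into a look-up is to assign each star to the nearest tabulated locus point, with every index difference scaled by the tabulated dispersion. This nearest-point rule is an assumption made for illustration only; the text states merely that the classification can be done by looking up Tables 3 and 4. The entries below are the first five rows of Table 3, in its column order (G4300, Balmer index, Mg, Fe, TiO2), and the function name is hypothetical.

```python
import math

# label -> (medians, dispersions) in Angstrom; first five rows of Table 3,
# column order: G4300, Balmer index, Mg, Fe, TiO2.
LOCUS = {
    "O6-9": ((-0.27, 2.55, 0.46, 0.28, -0.00), (0.23, 0.23, 0.12, 0.43, 0.28)),
    "B0-3": ((-1.07, 4.20, 0.22, 0.10, 0.03), (0.45, 1.72, 0.12, 0.35, 0.49)),
    "B3-6": ((-1.60, 6.76, 0.12, 0.35, -0.04), (0.59, 1.62, 0.11, 0.28, 0.33)),
    "B6-9": ((-2.50, 11.43, -0.01, 0.49, -0.01), (1.04, 2.68, 0.28, 1.96, 5.37)),
    "A0-3": ((-2.52, 12.45, 0.08, 0.49, -0.05), (1.32, 2.65, 0.58, 0.90, 0.43)),
}


def nearest_locus_type(indices, locus=LOCUS):
    """Label of the locus point closest to `indices` (dispersion-scaled)."""
    def scaled_distance(label):
        medians, dispersions = locus[label]
        return math.sqrt(sum(((x - m) / s) ** 2
                             for x, m, s in zip(indices, medians, dispersions)))
    return min(locus, key=scaled_distance)
```

Because neighboring points such as B6-9 and A0-3 lie close together relative to their dispersions, a discrete label from such a look-up is uncertain near bin edges; selecting a segment of the locus avoids that hard boundary.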

4 SVM-BASED CLASSIFICATION 

Alternatively, we can translate the line indices-based stellar loci into the MK class system for individual
stars. To do this, we employ an SVM algorithm to automatically assign the proper MK class to a stellar
spectrum.

SVM is a supervised machine learning algorithm for classification and regression (Cortes et
al. 1995). In general, a supervised algorithm uses a small sample with multi-dimensional input
variables and known class labels as the training dataset. The SVM classification is built in two steps.
First, with the training dataset, the optimized non-linear boundaries between the different classes in the input
space are determined; they are defined by a subset of the training dataset located around the boundaries,
called the support vectors. Second, for a given input, the trained SVM model predicts
the class depending on where the input is located with respect to the support vectors. A typical
example of SVM classification can be found in Liu et al. (2014), and examples of SVM regression can be
found in Liu et al. (2012) and Liu et al. (2015).

Chang & Lin (2011) provide a multi-programming language package, LIBSVM, to implement the
SVM algorithm. Here, we use LIBSVM to classify the stars into MK types based on the line indices. We
arbitrarily separate the 3,134 stars with both high signal-to-noise ratio LAMOST spectra and SIMBAD
MK types into two equal-size groups. One group is used as the training dataset to train the SVM,
and the other as the test dataset to assess the performance. We use all 27 line indices listed in
Table 2 as the input vector. We adopt only 6 classes, namely OB, A, F, G, K, and M, and ignore the
decimal subtypes and the luminosity types in the SVM classification. O and B types are merged into one
class since there are very few O type stars in the sample.

Fig. 4 shows the stellar loci composed of the about 1500 test stars, color coded by SIMBAD class
label, in the space of the line indices (the Balmer index, Fe, Mg, G band, and TiO2). Because we use the
SIMBAD MK classes for the training dataset, we implicitly assume the SIMBAD MK classes to be the “standard”
classes to compare with. Fig. 5 shows similar stellar loci for exactly the same test dataset as in
Fig. 4, but with the colors coding the SVM-derived MK classes.

A comparison between Figs. 4 and 5 gives a qualitative impression of the performance of the SVM
classification. It is obvious that some OB type stars (blue pentagons) located in the bottom-right
corner of the Balmer index vs. Fe and Balmer index vs. G4300 planes, shown in the two top panels of Fig. 4, are
mistakenly classified as A type stars (cyan circles) by the SVM method, as shown in the corresponding
panels of Fig. 5. Moreover, although the SVM classification works quite well for stars from M to F type,
relatively hard and artificial-looking boundaries among the F, G, and K type stars can still be seen in Fig. 5.

A quantitative assessment of the performance of the SVM classification is based on the so-called
confusion matrix shown in Table 5, in which the columns stand for the “true” class labels and the rows
stand for the SVM-derived class labels. Each entry gives the percentage of the stars that belong
to the class in the column but are assigned to the class in the row by the SVM. The diagonal entries show the
completeness of the classification, i.e., the percentage of the stars in class X that are correctly classified as
the same class. The last column in Table 5 gives the contamination, which is the percentage of the stars
in the derived class X that actually belong to other classes.
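
The two statistics defined above can be computed from paired lists of “true” and derived labels. The following Python sketch follows those definitions; the function name and the return layout are choices made for this example and are not taken from LIBSVM.

```python
from collections import Counter


def completeness_contamination(true_labels, derived_labels, classes):
    """Per-class (completeness, contamination) in percent.

    completeness:  share of the stars truly in class X that are also
                   assigned to class X;
    contamination: share of the stars assigned to class X that truly
                   belong to another class.
    """
    counts = Counter(zip(derived_labels, true_labels))  # (row, column) cells
    result = {}
    for c in classes:
        n_true = sum(counts[(row, c)] for row in classes)     # column total
        n_derived = sum(counts[(c, col)] for col in classes)  # row total
        correct = counts[(c, c)]
        comp = 100.0 * correct / n_true if n_true else float("nan")
        cont = (100.0 * (n_derived - correct) / n_derived
                if n_derived else float("nan"))
        result[c] = (comp, cont)
    return result
```

A class with high completeness can still be heavily contaminated if many stars of a neighboring class are assigned to it, which is why both numbers are reported.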

Table 5 shows that A and G type stars have the highest completeness, larger than 90%, meaning that
more than 90% of the A or G type stars are correctly classified by the SVM algorithm. The completenesses
of F and M type stars are about 72% and 68%, respectively, which is still acceptable. However, the
completeness for OB and K type stars is only about 52%, implying that almost half of these two types of
stars are mis-classified by the SVM classifier. Indeed, about 44% of the “true” OB type stars are mis-classified
as A type, and a similar percentage of the “true” K type stars are mis-classified as G type. This is probably
because the spectral features of late B (early K) type stars are very similar to those of early A
(late G) type stars, so they are very difficult to disentangle with SVM. It may also be because
the adopted “true” classes from the SIMBAD database are compiled from various sources in the literature
and classified by eye, and hence are not well calibrated against each other. Therefore, the large dispersion
in the manually assigned MK classes may affect the performance of the SVM classification.


5 DISCUSSIONS 

5.1 The discrepancy between MILES and LAMOST spectra 

To provide an external comparison of the stellar locus in the space of the line indices, we calculate
the same line indices for the MILES sample (Sanchez-Blazquez et al. 2006), which contains 985
bright stellar spectra covering a wide range of stellar parameters. We overlay the stellar loci of the MILES
data as red lines in Figs. 6 and 7 for main-sequence and giant stars, respectively. For convenience,
we also mark the averaged effective temperatures along the stellar loci for reference. The figures show that
the stellar loci of LAMOST and MILES do not completely overlap, especially for
late type dwarf stars and all giant stars. Although, according to Tables 3 and 4, these differences are
mostly within 1 or 2σ, the overall shifts of the MILES loci in most panels of Figs. 6
and 7 are likely systematic. Looking back at the bottom-right panel of Fig. 2, the SIMBAD
stellar locus for M dwarf stars shows a similar systematic offset from the full LAMOST sample
(the contours). Therefore, it is likely that the M dwarf stars in the SIMBAD database are a biased
sample and do not represent the majority of the LAMOST M dwarfs. This also serves as a warning
that line index stellar loci derived from one survey should not be directly applied to another survey.
Calibrations of the line indices and of the sample selection function are necessary before doing so.

5.2 How to make the decision, the MK class or the line indices-based stellar locus? 

In the previous sections we showed two kinds of classification. The line index stellar locus orders the
different types of stars in a simple sequence, along which the effective temperature changes monotonically
from coolest to hottest. No hard boundary has to be set in the stellar locus to artificially separate the
stars into discrete classes. Users who want to select specific stars for their statistical studies of the
Milky Way can simply cut the data from any segment of the stellar locus.

On the other hand, the SVM-based classification assigns discrete MK type labels to stars based on
prior knowledge, namely the SIMBAD MK class labels. The MK classes compiled in the SIMBAD database
come from many sources in the literature, most of which classified the spectra by eye through comparison
with a small sample of standard stars. This may give rise to significant inconsistency between sources.
Calibration among different sources seems very difficult, since the MK classes are not continuous but discrete.

The practical issue for large spectroscopic surveys, such as the LAMOST survey, is that millions of
stars are observed and it is impossible to inspect each spectrum by eye. As shown by the exercise of
SVM classification in section 4, state-of-the-art machine learning techniques may not be very helpful,
because they need to be trained on prior knowledge that is accurate and self-consistent.

Based on this analysis, we suggest that LAMOST users employ the line index stellar
locus, rather than directly use the MK classes given in the catalog, to select the proper types of stars
to meet their specific needs. If users want to compare their sample with the literature, which may use
MK classes, they can quantitatively calculate the completeness and contamination via
the comparison of the stellar loci with the SIMBAD and SVM MK classes.

6 CONCLUSIONS 

In this paper, we revisit the fundamental issue of stellar classification using about 3,000 high
signal-to-noise ratio LAMOST spectra with known MK classes obtained from cross-identification with the SIMBAD
database. Although the MK classes have been widely used for more than 70 years and have become a
standard, they seem not easy to adapt to the large amount of data from present-day spectroscopic surveys. The MK
classes are constructed based on a very small sample of standard stars, which are mostly very bright and
located in the local volume near the Sun. New spectroscopic surveys, e.g., SDSS and LAMOST, can
probe the deep sky as far as 100 kpc and hence contain millions of stars from populations very different
from those in the solar neighborhood. The current standard star library is then incomplete compared with
survey data a few orders of magnitude larger. Another issue is that almost all stars with known MK
classes were classified by eye, which is unfortunately impossible in the era of large data. The third issue is
that the MK classes are discrete, which makes them difficult to calibrate.

We map the MK classes into the space of line indices and find that the stellar loci in the line indices
describe the MK classes well. Moreover, the loci naturally follow the change of the effective temperature.
For late type stars, the different luminosity types can also be disentangled in the stellar loci.

We then investigate the performance of an automatic MK classification based on the SVM
technique. We find that although A, F, G, and M types of stars can be well classified, almost half of the OB or
K type stars are mis-classified.

We therefore suggest that the classification of stars should be based on the continuous stellar
loci in line indices. The advantages of the stellar loci are that 1) they are continuous and one can cut
a group of data at any point on the loci; 2) the stellar loci are consistent with the effective temperature;
and 3) after selecting a group of stars from the stellar loci, one can easily estimate the completeness and
contamination of the sample in terms of MK classes.

Table 1 The number of stars in each MK class for the LAMOST-SIMBAD sample


Type  Total    V  IV/III  II/I    Type  Total    V  IV/III  II/I
O5        1    1       0     0    F7       15   12       2     1
O7        2    2       0     0    F8       55   38      15     2
O8        1    1       0     0    F9       22   11      11     0
O9        4    3       1     0    G0      435  398      36     1
B0       14    9       5     0    G1       21   19       2     0
B1       15    8       7     0    G2       57   33      24     0
B2       19   11       8     0    G3       16    9       7     0
B3        9    7       0     2    G4       21   17       2     2
B4        9    7       2     0    G5      280  218      62     0
B5       34   18      15     1    G6       23   11      12     0
B6        3    2       0     1    G7       17    7      10     0
B7       23   13      10     0    G8      224  114     109     1
B8       75   65      10     0    G9       27    8      19     0
B9      175  157      18     0    K0      183   89      74     3
A0      420  386      30     4    K1       56   13      31     0
A1       67   63       4     0    K2       83   38      33     0
A2      186  175      10     1    K3       25   15       7     0
A3       61   60       1     0    K4       25   17       8     0
A4       11   10       0     1    K5       21   14       5     1
A5       43   39       2     2    K6        9    9       0     0
A6        3    2       1     0    K7       11   11       0     0
A7       27   21       6     0    K8        5    5       0     0
A8       12   10       0     2    K9        2    2       0     0
A9        1    0       1     0    M0       21   12       9     0
F0       53   34      16     3    M1        6    3       3     0
F1        4    1       3     0    M2       13    9       4     0
F2       36   27       8     1    M3       12    7       5     0
F3       14   10       4     0    M4       10    9       1     0
F4        7    5       1     1    M5        4    2       2     0
F5       67   54      12     1    M7        2    0       1     1
F6       36   21      14     1    M8        1    1       0     0

Table 2 Line indices definition


Name 

Index Bandpass (A) 

Call K- 

3927.7-3939.7 


4083.50-4122.25 

CN'= 

4143.375-4178.375 

Ca4227'^ 

4223.500-4236.000 

04300'= 

4282.625-4317.625 


4319.75-4363.50 

Fe4383'= 

4370.375-4421.625 

He4388 

4381-4399 

Ca4455'= 

4453.375-4475.875 

He4471 

4462-4475 

Fe453L 

4515.500-4560.500 

He4542 

4536-4548 

Fe4668'= 

4635.250-4721.500 


4847.875-4876.625 

Fe5015'= 

4977.750-5054.000 

MgL 

5069.125-5134.125 

Mga" 

5154.125-5196.625 

Mgi," 

5160.125-5192.625 

Fe5270'= 

5245.650-5285.650 

Fe5335'= 

5312.125-5352.125 

Fe5406'= 

5387.500-5415.000 

Fe5709'= 

5698.375-5722.125 

Fe5782'= 

5778.375-5798.375 

NaD" 

5878.625-5911.125 

TiOL 

5938.375-5995.875 

Ti02‘= 

6191.375-6273.875 


6548.00-6578.00 


Pseudocontinua (A) 
3903-3923 4000-4020 


4041.60-4079.75 4128.50-4161.00 

4081.375- 4118.875 4245.375-4285.375 

4212.250- 4221.000 4242.250-4252.250 

4267.625- 4283.875 4320.125-4336.375 
4283.50-4319.75 4367.25-4419.75 

4360.375- 4371.625 4444.125-4456.625 
4365-4380 4398-4408 

4447.125- 4455.875 4478.375-4493.375 
4450-4463 4485-4495 

4505.500- 4515.500 4561.750-4580.500 
4526-4536 4548-4558 

4612.750-4631.500 4744.000-4757.750 
4827.875-4847.875 4876.625-4891.625 

4946.500- 4977.750 5054.000-5065.250 

4895.125- 4957.625 5301.125-5366.125 

4895.125- 4957.625 5301.125-5366.125 

5142.625- 5161.375 5191.375-5206.375 
5233.150-5248.150 5285.650-5318.150 

5304.625- 5315.875 5353.375-5363.375 

5376.250- 5387.500 5415.000-5425.000 

5674.625- 5698.375 5724.625-5738.375 

5767.125- 5777.125 5799.625-5813.375 

5862.375- 5877.375 5923.875-5949.875 

5818.375- 5850.875 6040.375-6105.375 

6068.375- 6143.375 6374.375-6416.875 
6420.00-6455.00 6600.00-6640.00 


^ Beers et al. 119991 
^ Worthey & O ttavian i |l997l 
Worthey et al. 119941 
Cohen, Blakeslee & Rvzhov [T998l 
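
As a rough illustration of how one row of Table 2 turns into an equivalent width, the sketch below assumes the usual Lick-style recipe: a straight-line pseudo-continuum is drawn through the mean fluxes of the two side bands, and the normalized line depth is integrated over the index bandpass. This recipe, the function name `equivalent_width`, and the synthetic Gaussian line are assumptions made for illustration; they are not taken from the paper. The three bands used are those of the Table 2 row whose index bandpass is 4847.875-4876.625 Å, the band around Hβ at 4861 Å.

```python
import numpy as np

def equivalent_width(wave, flux, band, blue, red):
    """Equivalent width in the units of `wave`; positive for absorption."""
    def mean_flux(lo, hi):
        sel = (wave >= lo) & (wave <= hi)
        return flux[sel].mean()

    # Assumed recipe: the pseudo-continuum is a straight line through the mean
    # fluxes of the two side bands, placed at the side-band midpoints.
    xb, xr = 0.5 * (blue[0] + blue[1]), 0.5 * (red[0] + red[1])
    fb, fr = mean_flux(*blue), mean_flux(*red)

    sel = (wave >= band[0]) & (wave <= band[1])
    w, f = wave[sel], flux[sel]
    cont = fb + (fr - fb) * (w - xb) / (xr - xb)

    # EW = integral of (1 - F/F_cont) over the index band (trapezoidal rule).
    y = 1.0 - f / cont
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(w)))

# Synthetic spectrum: flat continuum with a Gaussian absorption line
# (depth 0.5, sigma 2 Angstrom) centred at 4861 Angstrom.
wave = np.arange(4800.0, 4900.0, 0.1)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4861.0) / 2.0) ** 2)

ew = equivalent_width(
    wave, flux,
    band=(4847.875, 4876.625),   # index bandpass from Table 2
    blue=(4827.875, 4847.875),   # blue pseudocontinuum
    red=(4876.625, 4891.625),    # red pseudocontinuum
)
```

For this synthetic line the analytic equivalent width is depth × σ × √(2π), about 2.51 Å, which the numerical integral should reproduce closely. With this sign convention absorption gives a positive value, and an index can become negative when its side bands are more depressed than the central band.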


Table 3 The median locus in the space of equivalent width of the spectral lines for main-sequence stars.

Type    EW_G4300 (Å)   EW_Hγ (Å)      EW_Mg (Å)     EW_Fe (Å)     EW_TiO2 (Å)    Number of stars
O6-9    -0.27±0.23      2.55±0.23      0.46±0.12     0.28±0.43    -0.00±0.28       6
B0-3    -1.07±0.45      4.20±1.72      0.22±0.12     0.10±0.35     0.03±0.49      35
B3-6    -1.60±0.59      6.76±1.62      0.12±0.11     0.35±0.28    -0.04±0.33      34
B6-9    -2.50±1.04     11.43±2.68     -0.01±0.28     0.49±1.96    -0.01±5.37     237
A0-3    -2.52±1.32     12.45±2.65      0.08±0.58     0.49±0.90    -0.05±0.43     684
A3-6    -1.29±1.01     11.07±2.09      0.51±0.33     0.62±0.29    -0.11±0.40     111
A6-9    -0.27±0.80      8.88±1.86      0.74±0.30     0.72±0.30    -0.27±83.98     33
F0-3     0.89±1.04      5.64±2.28      1.01±0.38     0.89±0.31    -0.21±7.70      72
F3-6     2.18±1.20      2.90±2.75      1.19±0.36     1.26±0.56    -0.17±7.61      90
F6-9     3.42±1.11      0.87±2.16      1.49±0.39     1.62±0.50    -0.15±3.48      82
G0-3     5.38±1.08     -3.15±2.29      1.98±0.61     2.59±1.93    -0.03±0.48     459
G3-6     5.86±0.98     -4.39±2.45      2.44±0.67     3.32±2.42     0.05±0.49     255
G6-9     6.21±0.72     -6.45±1.97      2.88±0.76     4.41±1.45     0.19±0.46     140
K0-3     6.28±1.35     -7.72±3.53      3.23±0.91     5.23±2.42     0.45±2.52     155
K3-6     5.92±0.84    -10.45±2.78      4.25±0.55    11.63±2.49     1.96±11.10     55
K6-9     5.12±0.94    -10.07±4.24      4.14±0.71    12.68±1.33     5.08±15.70     27
M0-3     3.52±1.31     -9.21±3.28      3.26±0.81    11.78±0.91    21.30±15.85     31
M3-6     2.72±1.02    -11.57±5.31      3.01±0.45    11.11±2.66    33.69±20.42     18
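
One simple way to use a tabulated locus like Table 3 directly is to assign a star to the bin whose median lies closest in index space. The sketch below is not the authors' procedure: it is a minimal nearest-locus lookup that uses only four rows and three indices of Table 3, and it treats the quoted ± values as per-index scatter, which is an assumption. The names `LOCUS` and `nearest_locus` are hypothetical.

```python
import numpy as np

# (G4300, H-gamma, Mg) medians and quoted +/- values for four rows of Table 3.
LOCUS = {
    "A0-3": ((-2.52, 12.45, 0.08), (1.32, 2.65, 0.58)),
    "F3-6": ((2.18, 2.90, 1.19), (1.20, 2.75, 0.36)),
    "G3-6": ((5.86, -4.39, 2.44), (0.98, 2.45, 0.67)),
    "K3-6": ((5.92, -10.45, 4.25), (0.84, 2.78, 0.55)),
}

def nearest_locus(ew):
    """Return the bin whose median is closest in scatter-scaled distance."""
    ew = np.asarray(ew, dtype=float)

    def dist(item):
        med, sig = (np.asarray(v, dtype=float) for v in item[1])
        # Assumption: the +/- values act as a per-index scale length.
        return float(np.sum(((ew - med) / sig) ** 2))

    return min(LOCUS.items(), key=dist)[0]

label_cool = nearest_locus((5.5, -4.0, 2.3))
label_hot = nearest_locus((-2.0, 12.0, 0.1))
```

With these inputs the first star falls in the G3-6 bin and the second in the A0-3 bin. Because neighbouring bins share a boundary subtype and their scatters overlap, a lookup of this kind gives a position along the locus rather than a sharp class boundary.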
Table 4 The median locus in the space of equivalent width of the spectral lines for stars with luminosity types IV or III.

Type    EW_G4300 (Å)   EW_Hγ (Å)      EW_Mg (Å)     EW_Fe (Å)     EW_TiO2 (Å)    Number of stars
B0-3     ...            2.78±1.78      0.34±0.21     0.09±0.33     0.07±13.83     20
B3-6    -1.55±0.59      6.70±1.33      0.17±0.21     0.16±0.32     0.01±0.34      17
B6-9    -1.85±1.04      8.36±2.35      0.04±0.35     0.50±0.34    -0.02±13.38     38
A0-3    -1.61±0.95     11.03±1.97      0.24±0.40     0.41±0.36     0.03±0.32      45
A3-6    -0.89±1.12     10.41±1.75      0.55±0.28     0.50±0.20    -0.35±0.08       4
A6-9    -0.83±1.16     10.02±2.17      0.63±0.32     0.63±0.23    -0.22±0.18       8
F0-3     1.09±1.10      5.50±2.58      1.14±0.21     1.04±0.36    -0.24±14.86     31
F3-6     2.46±0.91      2.41±1.80      1.34±0.31     1.30±0.45    -0.11±11.07     31
F6-9     3.62±1.03      0.61±2.01      1.61±0.33     1.56±0.50    -0.10±2.06      42
G0-3     5.05±1.28     -2.50±2.91      2.07±0.81     2.50±1.49     0.03±9.27      69
G3-6     6.27±1.17     -5.71±2.78      2.76±0.78     3.49±1.42     0.27±0.61      83
G6-9     6.93±0.91     -7.72±2.15      3.40±2.55     4.10±3.74     0.80±10.62    150
K0-3     6.90±0.88     -8.99±3.16      3.99±0.77     5.77±2.14     1.32±4.82     189
K3-6     6.67±0.93    -10.01±3.38      4.47±0.80     8.38±2.20     3.18±3.80      24
M0-3     5.79±1.42     -9.16±3.06      5.37±0.87     9.75±1.87    27.04±13.31     21
M3-6     3.59±0.99     -4.85±9.35      6.54±0.67     6.84±1.67    42.62±6.16       8


Table 5 The confusion matrix in percentage of the SVM-based MK classification

                              SIMBAD
SVM      OB        A         F         G         K         M       contamination
OB     52.60%    6.97%    1.94%    0.00%    0.00%    0.00%    24.06%
A      44.79%   90.38%    8.39%    0.53%    1.43%    0.00%    21.83%
F       1.56%    1.68%   72.26%    3.57%    0.95%    0.00%    22.22%
G       0.52%    0.48%   17.42%   90.91%   43.81%    2.86%    19.43%
K       0.52%    0.48%    0.00%    4.99%   52.86%   28.57%    26.97%
M       0.00%    0.00%    0.00%    0.00%    0.95%   68.57%     7.69%
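
The layout of Table 5 (rows for the SVM class, columns for the SIMBAD class, each column summing to about 100%) can be reproduced from two label lists as in the sketch below. The completeness is the diagonal of the column-normalized matrix. The contamination is taken here as the fraction of objects assigned to a class whose SIMBAD class differs; this definition, the function name `confusion_summary`, and the toy labels are assumptions for illustration, since the per-class counts of the test set are not given in this section.

```python
import numpy as np

CLASSES = ["OB", "A", "F", "G", "K", "M"]

def confusion_summary(true, pred, classes=CLASSES):
    """Return (percent matrix, completeness, contamination), all in percent."""
    idx = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for t, p in zip(true, pred):
        counts[idx[p], idx[t]] += 1          # rows: predicted, columns: true

    col = counts.sum(axis=0, keepdims=True)  # stars per true class
    row = counts.sum(axis=1)                 # stars per predicted class
    # Empty classes are divided by 1 to avoid 0/0; their entries stay 0.
    percent = 100.0 * counts / np.where(col > 0, col, 1)
    completeness = np.diag(percent).copy()
    contamination = 100.0 * (row - np.diag(counts)) / np.where(row > 0, row, 1)
    return percent, completeness, contamination

# Toy labels: one of two OB stars is assigned to class A.
true = ["OB", "OB", "A", "A", "A", "A"]
pred = ["OB", "A", "A", "A", "A", "A"]
percent, completeness, contamination = confusion_summary(true, pred)
```

In this toy case the OB completeness is 50%, the A completeness is 100%, and the A contamination is 20% (one of the five stars assigned to A is an OB star), the same pattern of OB stars leaking into A that Table 5 shows.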
Fig. 1 The line indices for different MK classes. Each position on the x-axis corresponds to a spectral line, and the y-axis gives the median equivalent width of that line. The colors and symbols code the O (black crosses), B (blue pentagons), A (cyan large circles), F (green triangles), G (orange small circles), K (red hexagons), and M (magenta rectangles) types.
Fig. 2 The contours show the distribution of the LAMOST stars in the space of line indices. The top-left, top-right, middle-left, middle-right, bottom-left, and bottom-right panels are for the Hγ vs. Fe, Hγ vs. G band, Hγ vs. Mg, Mg vs. Fe, Fe vs. G band, and Fe vs. TiO2 planes. The dark lines with circles indicate the stellar loci of the main-sequence stars with MK classes from the SIMBAD database, while the red dashed lines with asterisks indicate the stellar loci of the giant stars (type IV/III) with MK classes from the SIMBAD database.
Fig. 3 The panels are the same as their counterparts in Fig. 2 but are zoomed in on the region around the locus of the early-type stars.
Fig. 4 The distribution of the SIMBAD MK classes of the test data for SVM in the space of line indices. The top-left, top-right, middle-left, middle-right, bottom-left, and bottom-right panels show the distributions in the Hγ vs. Fe, Hγ vs. G band, Hγ vs. Mg, Mg vs. Fe, Fe vs. G band, and Fe vs. TiO2 planes. The marginalized distributions of each single line index for the different spectral types are shown at the right and top edges of each panel. The colors and symbols code the OB (blue pentagons), A (cyan large circles), F (green triangles), G (orange small circles), K (red hexagons), and M (magenta rectangles) types.
Fig. 5 The distribution of the SVM-derived MK classes of the test data in the space of line indices. The top-left, top-right, middle-left, middle-right, bottom-left, and bottom-right panels show the distributions in the Hγ vs. Fe, Hγ vs. G band, Hγ vs. Mg, Mg vs. Fe, Fe vs. G band, and Fe vs. TiO2 planes. The marginalized distributions of each single line index for the different spectral types are shown at the right and top edges of each panel. The colors and symbols code the OB (blue pentagons), A (cyan large circles), F (green triangles), G (orange small circles), K (red hexagons), and M (magenta rectangles) types.
Fig. 6 The stellar loci for main-sequence (luminosity type V) stars calculated from the median location of each subtype in the space of line indices. The top-left, top-right, middle-left, middle-right, bottom-left, and bottom-right panels show the loci in the Hγ vs. Fe, Hγ vs. G band, Hγ vs. Mg, Mg vs. Fe, Fe vs. G band, and Fe vs. TiO2 planes. The dark lines with circles indicate the stellar loci of main-sequence stars from LAMOST spectra, while the red lines marked with the effective temperatures show the main-sequence locus of the MILES library.
Fig. 7 The stellar loci for giant (luminosity type IV/III) stars calculated from the median location of each subtype in the space of line indices. The top-left, top-right, middle-left, middle-right, bottom-left, and bottom-right panels show the loci in the Hγ vs. Fe, Hγ vs. G band, Hγ vs. Mg, Mg vs. Fe, Fe vs. G band, and Fe vs. TiO2 planes. The dashed lines with asterisks indicate the stellar loci of giant stars from LAMOST spectra, while the red dashed lines marked with the effective temperatures show the giant locus of the MILES library.